ABSTRACT


This research project aims to develop an abstractive summarization system for Indian legal documents. The system leverages the 
power of the T5 transformer model, fine-tuned using Quantized Low-Rank Adaptation (QLoRA). The training data comprises two 
datasets, the Indian Legal Corpus (ILC) and IN-Abs, both containing court cases and their corresponding abstractive summaries. 


The system is designed to accept legal text input directly or extract it from uploaded DOCX or PDF documents. An initial 
extractive summary is generated using the bert-extractive-summarizer, which is subsequently fed into the fine-tuned T5 model to 
produce an abstractive summary. 


The principal result of this research is the successful implementation of a system capable of generating abstractive summaries of 
Indian legal documents. The system achieved a ROUGE-1 score of 46.37%, demonstrating its effectiveness. 


In conclusion, this research contributes to the field of legal document summarization by providing a system that can generate 
concise and coherent summaries, thereby aiding in the efficient comprehension of complex legal texts. This work also opens
avenues for further improvements and applications in the legal tech domain.
INTRODUCTION

The field of legal document summarization has seen significant 
advancements with the advent of transformer-based models. 
However, the complexity and specificity of Indian legal 
documents present unique challenges that necessitate specialized 
solutions. This study introduces an innovative approach to 
abstractive summarization of Indian legal documents, building 
upon recent advancements in transformer models and
fine-tuning techniques.


The purpose of this research is to develop a system capable of 
generating abstractive summaries from Indian legal documents. 
The system employs the T5 transformer model, which has
shown promising results in various natural language processing 
tasks. To adapt the model to the specific task and data, we 
use Quantized Low-Rank Adaptation (QLoRA), a fine-tuning 
technique that has demonstrated effectiveness in similar 
applications. 


The T5 model is trained on two datasets, the Indian Legal
Corpus (ILC) and IN-Abs, both of which contain court cases
and their abstractive summaries. Training on these datasets
exposes the model to the vocabulary and structure
of Indian legal texts.


The system accepts legal text directly or extracts it from
uploaded DOCX or PDF documents. An initial extractive
summary is generated using the bert-extractive-summarizer,
which is then fed into the fine-tuned T5 model to produce an
abstractive summary.


The results of this research indicate that the system can effectively
generate abstractive summaries, achieving a ROUGE-1 score 
of 46.37%. This study contributes to the ongoing efforts in 
the field of legal document summarization and opens up new 
possibilities for future research and applications in the legal 
tech domain. 


MATERIALS AND METHODS 

T5, short for Text-to-Text Transfer Transformer, is a versatile
neural network model developed by Google for various natural
language processing tasks.[2] It takes a text-to-text
approach: every task is framed as mapping an input text to an
output text. This makes T5 flexible for tasks like translation,
summarization, sentiment classification, and more. For
summarization, T5 works abstractively, generating new sentences
rather than copying them from the given text. The text must be
tokenized into numerical IDs for training and inference. By
offering a unified framework for diverse tasks, T5 has
significantly impacted the field of NLP.
Figure 1: T5’s text-to-text framework. Each task is expressed as a
prefixed text input that maps to a text output. Examples shown in the
figure: "translate English to German: That is good." yields
"Das ist gut."; "summarize: state authorities dispatched emergency
crews tuesday to survey the damage after an onslaught of severe
weather in mississippi." yields "six people hospitalized after a storm
in attala county."; "cola sentence: The course is jumping well." and
"stsb sentence1: The rhino grazed on the grass. sentence2: A rhino is
grazing in a field." illustrate the classification and
sentence-similarity inputs.
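The prefix convention described above can be illustrated with a minimal sketch. The helper name build_t5_input is hypothetical, and whether the fine-tuned model in this work was trained with a "summarize" prefix is an assumption, not something the text states.

```python
def build_t5_input(task_prefix: str, text: str) -> str:
    """Prepend a task prefix so a single text-to-text model can serve many tasks."""
    # Collapse whitespace runs, which are common in text extracted from PDF or DOCX.
    cleaned = " ".join(text.split())
    # Normalise the prefix so it ends with exactly one colon.
    prefix = task_prefix.strip().rstrip(":")
    return f"{prefix}: {cleaned}"


# Hypothetical legal sentence, used only to show the resulting input string.
example = build_t5_input("summarize", "The appellant  filed\nan appeal against the order.")
print(example)
```

The same function reproduces the translation input in Figure 1 when called with the prefix "translate English to German".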
Figure 2: Proposed system


This project employs the T5 transformer model for the
abstractive summarization of Indian legal documents. The 
model is fine-tuned using Quantized Low-Rank Adaptation 
(QLoRA), a method that adapts the pre-trained model to the 
specific task of summarizing legal documents. 


Datasets 

The model was trained on two datasets: the Indian Legal 
Corpus (ILC) and IN-Abs. Both datasets contain Indian court 
cases along with their abstractive summaries. 


Document Processing 

Users can input the legal text directly or upload it as a document
in DOCX or PDF format. The text is then extracted from these
documents and sent to the BERT Extractive Summarizer.[9]
This summarizer generates an extractive summary, which
serves as the input for our fine-tuned model.
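The two-stage flow described above can be sketched as a function that takes both stages as callables. This is an illustration under stated assumptions: summarize_document is a hypothetical name, and the two stand-in functions below are toy placeholders, not the BERT Extractive Summarizer or the fine-tuned T5 model.

```python
from typing import Callable


def summarize_document(
    text: str,
    extractive: Callable[[str], str],
    abstractive: Callable[[str], str],
) -> str:
    """Run the extractive stage first, then pass its output to the abstractive stage."""
    if not text.strip():
        raise ValueError("no text to summarize")
    condensed = extractive(text)   # stage 1: keep the most salient sentences
    return abstractive(condensed)  # stage 2: rewrite them as a new summary


# Toy stand-ins for illustration only; NOT the models used in this work.
def first_two_sentences(text: str) -> str:
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:2]) + "."


def echo_upper(text: str) -> str:
    return text.upper()


result = summarize_document(
    "The appeal is allowed. Costs are awarded. The matter is closed.",
    first_two_sentences,
    echo_upper,
)
print(result)
```

Passing the stages as arguments keeps the pipeline independent of any particular library, so either stage can be swapped without touching the control flow.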


Model Fine-tuning and Summarization 

Quantized Low-Rank Adaptation (QLoRA) is an efficient
fine-tuning approach that significantly reduces memory usage,
enabling the fine-tuning of large models on a single GPU.[3]
It backpropagates gradients through a frozen, 4-bit quantized
pretrained language model into Low-Rank Adapters (LoRA).
QLoRA introduces several innovations to save memory without
sacrificing performance, such as 4-bit NormalFloat (NF4), a
data type designed for normally distributed weights, and
double quantization, which quantizes the quantization constants
themselves to reduce the average memory footprint. Its authors
report fine-tuning more than 1,000 models with it and achieving
state-of-the-art results.
Figure 3: Different fine-tuning methods and their memory
requirements. QLoRA improves over LoRA by quantizing
the transformer model to 4-bit precision and using paged
optimizers to handle memory spikes.
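The memory savings described above come from two sources: the adapters train far fewer parameters than the full weight matrix, and the frozen weights are stored in 4 bits. The arithmetic below is a rough sketch. The layer width of 768 and the rank of 8 are hypothetical values chosen for illustration, not settings reported in this work, and the memory estimate ignores the quantization constants that double quantization compresses.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the adapter pair B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def weight_memory_bytes(num_params: int, bits_per_param: int) -> float:
    """Storage for the weights alone, ignoring quantization constants."""
    return num_params * bits_per_param / 8


d = 768                    # hypothetical width of one square projection matrix
full = d * d               # parameters in the frozen matrix W0
adapter = lora_trainable_params(d, d, rank=8)  # hypothetical rank

# LoRA learns an update B @ A that is added to the frozen W0.
print(f"trainable fraction: {adapter / full:.4f}")

# Storing W0 in 4 bits instead of 16 bits cuts its footprint by this factor.
ratio = weight_memory_bytes(full, 16) / weight_memory_bytes(full, 4)
print(f"16-bit vs 4-bit storage ratio: {ratio}")
```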


The extractive summary is fed into the fine-tuned T5 model,
which generates the final abstractive summary. The
fine-tuning process involves training the T5 model on our datasets
using QLoRA, which adapts the model to the specific task of
summarizing legal documents.


Evaluation 


The performance of the model was evaluated using the ROUGE-1
score, a common summarization metric that measures unigram
overlap between a generated summary and its reference.
Our model achieved a ROUGE-1 score of 46.37%, indicating
substantial lexical overlap with the reference summaries.
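For readers unfamiliar with the metric, ROUGE-1 can be computed from clipped unigram counts. The sketch below is a simplified version that splits on whitespace and lowercases, without the stemming or tokenization rules that standard ROUGE packages apply. The text does not state whether the reported 46.37% is precision, recall, or F1, so all three are returned; the example sentences are made up.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Unigram-overlap precision, recall, and F1 between two texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection clips each word's count to the smaller of the two.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1(
    "the court dismissed the appeal",
    "the high court dismissed the appeal with costs",
)
print(scores)
```

In this example every candidate word appears in the reference, so precision is 1.0, while recall is 5/8 because three reference words are missing from the candidate.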


RESULTS AND DISCUSSION 

The project was designed with the aim of creating an abstractive
summarization model for Indian legal documents. The T5
transformer model was chosen for this task due to its proven
effectiveness in text summarization tasks. The model was
fine-tuned using Quantized Low-Rank Adaptation (QLoRA), a
method that adapts the pre-trained model to the specific task of
summarizing legal documents.


The model was trained on two datasets: the Indian Legal Corpus 
(ILC) and IN-Abs. Both datasets contain Indian court cases 
along with their abstractive summaries. The training process 
involved optimizing the model parameters to minimize the loss 
function, which measures the difference between the model’s 
predictions and the actual summaries. The fine-tuned model 
achieved a training loss of 1.98. 
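The loss function is not specified further in the text. Assuming the standard token-level cross-entropy objective used for sequence-to-sequence models, it can be written as follows, where x is the input (the extractive summary), y_1, ..., y_T are the tokens of the reference summary, and theta denotes the trainable adapter parameters:

```latex
\mathcal{L}(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta\left(y_t \mid y_{<t},\, x\right)
```

Under that assumption, and if the loss is averaged per token with natural logarithms, a training loss of 1.98 corresponds to a per-token perplexity of roughly exp(1.98), or about 7.2.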


MODEL                             ROUGE-1
T5 base on ILC & IN-Abs           8.01%
Fine-tuned T5 on ILC & IN-Abs     46.37%


Table 1: Results
The validation of the model was performed by testing it
on unseen legal documents and comparing the generated 
summaries with the actual summaries. The model achieved a 
ROUGE-1 score of 46.37%, a significant improvement over the 
base T5 model, which achieved a ROUGE-1 score of 8.01% on 
the test set. 


The results demonstrate the effectiveness of the T5 model in
summarizing Indian legal documents when fine-tuned using
QLoRA. The ROUGE-1 score of 46.37% indicates that the
generated summaries share a large proportion of their words
with the actual summaries.


The use of the BERT Extractive Summarizer to generate an
extractive summary, which serves as the input for our
fine-tuned model, proved to be an effective strategy. This approach
allowed the model to focus on the most important parts of the 
document, thereby improving the quality of the abstractive 
summary. 


However, it’s important to note that while the model achieved 
a high ROUGE-1 score, there is still room for improvement. 
Future work could explore other fine-tuning methods or use 
additional datasets to further improve the model’s performance. 


CONCLUSION 

This research project successfully developed an abstractive 
summarization model for Indian legal documents using the 
T5 transformer model, fine-tuned with Quantized Low-Rank 
Adaptation (QLoRA). The model was trained on two datasets, 
the Indian Legal Corpus (ILC) and IN-Abs, both containing 
Indian court cases and their abstractive summaries. 


The unique approach of using the BERT Extractive Summarizer 
to generate an extractive summary, which was then used as 
input for the fine-tuned T5 model, proved to be effective. This
strategy allowed the model to focus on the most important 
parts of the document, thereby enhancing the quality of the 
abstractive summary. 


The model achieved a ROUGE-1 score of 46.37%, demonstrating
its effectiveness in generating summaries that closely match the 
actual summaries. This is a significant improvement over the 
base T5 model, which achieved a ROUGE-1 score of 8.01% 
on the test set. 


While the results are promising, there is still room for 
improvement. Future work could explore other fine-tuning 
methods or use additional datasets to further enhance the 
model’s performance. This project lays a solid foundation for 
future research in the field of legal document summarization. It 
has the potential to significantly contribute to the efficiency and 
accessibility of legal proceedings, thereby having a profound 
impact on the legal system. 